

CLion: Efficient Cautious Lion Optimizer with Enhanced Generalization

Huang, Feihu, Zhang, Guanyi, Chen, Songcan

arXiv.org Machine Learning

Lion optimizer is a popular learning-based optimization algorithm in machine learning, which shows impressive performance in training many deep learning models. Although the convergence property of the Lion optimizer has been studied, its generalization analysis is still missing. To fill this gap, we study the generalization property of Lion via algorithmic stability based on mathematical induction. Specifically, we prove that Lion has a generalization error of $O(\frac{1}{N\tau^T})$, where $N$ is the training sample size, $\tau>0$ denotes the smallest absolute value of a non-zero element in the gradient estimator, and $T$ is the total number of iterations. In addition, we obtain an interesting byproduct: the SignSGD algorithm has the same generalization error as Lion. To enhance the generalization of Lion, we design a novel, efficient Cautious Lion (i.e., CLion) optimizer by cautiously applying the sign function. Moreover, we prove that our CLion has a lower generalization error of $O(\frac{1}{N})$ than the $O(\frac{1}{N\tau^T})$ of Lion, since the parameter $\tau$ is generally very small. Meanwhile, we study the convergence property of our CLion optimizer, and prove that it attains a fast convergence rate of $O(\frac{\sqrt{d}}{T^{1/4}})$ under the $\ell_1$-norm of the gradient for nonconvex stochastic optimization, where $d$ denotes the model dimension. Extensive numerical experiments demonstrate the effectiveness of our CLion optimizer.
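The abstract describes CLion only at a high level, so the following is a minimal sketch of a Lion step with an illustrative cautious mask: coordinates where the sign update disagrees with the current gradient are zeroed, in the spirit of cautious optimizers. The masking rule and its rescaling are assumptions for illustration, not the authors' reference implementation.

```python
import numpy as np

def lion_step(theta, grad, m, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0, cautious=False):
    """One Lion update; cautious=True adds an illustrative masking rule."""
    update = np.sign(beta1 * m + (1 - beta1) * grad)   # sign-based step
    if cautious:
        # Keep only coordinates where the step agrees with the gradient
        # (assumed cautious rule); rescale to preserve average magnitude.
        mask = (update * grad > 0).astype(theta.dtype)
        mask *= mask.size / max(mask.sum(), 1.0)
        update = update * mask
    theta = theta - lr * (update + weight_decay * theta)
    m = beta2 * m + (1 - beta2) * grad                 # momentum update
    return theta, m
```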


Confidence-Based Decoding is Provably Efficient for Diffusion Language Models

Cai, Changxiao, Li, Gen

arXiv.org Machine Learning

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models for language modeling, allowing flexible generation order and parallel generation of multiple tokens. However, this flexibility introduces a challenge absent in AR models: the \emph{decoding strategy} -- which determines the order and number of tokens generated at each iteration -- critically affects sampling efficiency. Among decoding strategies explored in practice, confidence-based methods, which adaptively select which and how many tokens to unmask based on prediction confidence, have shown strong empirical performance. Despite this success, our theoretical understanding of confidence-based decoding remains limited. In this work, we develop the first theoretical analysis framework for confidence-based decoding in DLMs. We focus on an entropy sum-based strategy that continues unmasking tokens within each iteration until the cumulative entropy exceeds a threshold, and show that it achieves $\varepsilon$-accurate sampling in KL divergence with an expected number of iterations $\widetilde O(H(X_0)/\varepsilon)$, where $H(X_0)$ denotes the entropy of the target data distribution. Notably, this strategy yields substantial sampling acceleration when the data distribution has low entropy relative to the sequence length, while automatically adapting to the intrinsic complexity of data without requiring prior knowledge or hyperparameter tuning. Overall, our results provide a theoretical foundation for confidence-based decoding and may inform the design of more efficient decoding strategies for DLMs.
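As a concrete illustration of the entropy sum-based strategy, here is a minimal sketch of one decoding iteration: masked positions are ranked by predictive entropy (most confident first) and unmasked until the cumulative entropy would exceed the threshold. The `probs` input stands in for a hypothetical DLM denoiser call, and always unmasking at least one position per iteration is our assumption to guarantee progress, not a detail taken from the paper.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a probability vector."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

def entropy_sum_decode_step(probs, masked, threshold):
    """Select masked positions to unmask in one decoding iteration.

    probs[i] is the model's predictive distribution at position i.
    Positions are taken in order of increasing entropy until adding the
    next one would push the cumulative entropy past `threshold`.
    """
    ents = {i: entropy(probs[i]) for i in masked}
    order = sorted(ents, key=ents.get)       # most confident first
    chosen, total = [], 0.0
    for i in order:
        if chosen and total + ents[i] > threshold:
            break                            # budget for this round spent
        chosen.append(i)
        total += ents[i]
    return chosen  # unmask these positions, e.g. by sampling or argmax
```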


6454dcd80b5373daaa97e53ce32c78a1-Paper-Conference.pdf

Neural Information Processing Systems

We propose two innovative algorithms, DP-GLMtron and DP-TAGLMtron, that outperform the conventional DP-SGD. In light of the vast quantities of personal and sensitive information involved, traditional methods of ensuring privacy are encountering significant challenges.


Generalization Bounds for Neural Networks via Approximate Description Length

Amit Daniely, Elad Granot

Neural Information Processing Systems

Namely, that the empirical loss of all the functions in the class is $\epsilon$-close to the true loss. Finally, we develop a set of tools for calculating the approximate description length of classes of functions that can be presented as a composition of linear function classes and non-linear functions.


Online Convex Optimization with Continuous Switching Constraint

Neural Information Processing Systems

In many sequential decision-making applications, changing the decision brings an additional cost, such as the wear-and-tear cost associated with changing server status. To control the switching cost, we introduce the problem of online convex optimization with continuous switching constraint, where the goal is to achieve a small regret given a budget on the overall switching cost. We first investigate the hardness of the problem, and provide a lower bound of order $\Omega(\sqrt{T})$ when the switching cost budget $S = \Omega(\sqrt{T})$, and $\Omega(\min\{T/S, T\})$ when $S = O(\sqrt{T})$, where $T$ is the time horizon. The essential idea is to carefully design an adaptive adversary, who can adjust the loss function according to the cumulative switching cost of the player incurred so far, based on the orthogonal technique. We then develop a simple gradient-based algorithm which enjoys the minimax optimal regret bound.
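To make the setup concrete, here is a sketch of one natural way to respect a continuous switching budget: run online gradient descent but cap each round's movement at $S/T$, so the cumulative switching cost $\sum_t \|x_{t+1} - x_t\|$ never exceeds $S$. This cap-based scheme is our own simplification for illustration, not the paper's minimax-optimal algorithm.

```python
import numpy as np

def ogd_with_switching_budget(grad_fn, x0, T, S, eta):
    """Online gradient descent whose total movement stays within budget S.

    grad_fn(x, t) returns the loss gradient at round t. Each step is
    shrunk to length at most S / T, so the switching cost over T rounds
    is at most S by construction (an illustrative, not optimal, scheme).
    """
    x = np.asarray(x0, dtype=float)
    cap = S / T                          # per-round movement allowance
    history = [x.copy()]
    for t in range(T):
        step = -eta * grad_fn(x, t)
        norm = np.linalg.norm(step)
        if norm > cap:                   # enforce the switching budget
            step *= cap / norm
        x = x + step
        history.append(x.copy())
    return history
```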




Neural Information Processing Systems

In the deterministic setting, where the data is given without any probabilistic assumptions, significant advances in DP linear regression have been made [77, 57, 68, 16, 7, 83, 31, 67, 82, 71]. In the randomized settings, each example $\{x_i, y_i\}$ is drawn i.i.d. We explain the closely related ones in Section 2.3, with analysis when the covariance matrix has a spectral gap. The resulting utility guarantees are the same as those from [23], which are discussed in Section 2.3. When privacy is not required, we know from Theorem 2.2 that under Assumptions A.1-A.3, we can achieve an error rate of $O(\kappa\sqrt{V/n})$.


Appendices: A Bernoulli-CRS Properties

Neural Information Processing Systems

Let us define $K \in \mathbb{R}^{n \times n}$, a random diagonal sampling matrix where $K_{j,j} \sim \mathrm{Bernoulli}(p_j)$ for $1 \le j \le n$. Therefore, Bernoulli-CRS will perform on average the same amount of computation as the fixed-rank CRS. This formulation immediately hints at the possibility of sampling over the input channel dimension, similarly to sampling column-row pairs in matrices. Let $\ell$ be a $\beta$-Lipschitz loss function, and let the network be trained with SGD using a properly decreasing learning rate. Let us denote the weight, bias and activation gradients with respect to a loss function $\ell$ by $\nabla W_l$, $\nabla b_l$, $\nabla a_l$ respectively.
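The diagonal-$K$ formulation above translates directly into column-row sampling for approximate matrix products. Below is a minimal sketch: column-row pair $j$ is kept with probability $p_j$ and rescaled by $1/p_j$, the standard importance-sampling choice (assumed here), which makes the estimator of $AB$ unbiased.

```python
import numpy as np

def bernoulli_crs_matmul(A, B, p):
    """Unbiased Bernoulli-CRS estimate of A @ B.

    keep[j] ~ Bernoulli(p[j]) plays the role of K[j, j]; rescaling kept
    column-row pairs by 1 / p[j] gives E[(A * scale) @ B] = A @ B.
    """
    n = A.shape[1]
    keep = np.random.rand(n) < p             # K_jj ~ Bernoulli(p_j)
    scale = np.where(keep, 1.0 / p, 0.0)     # rescale kept pairs
    # Equivalent to A @ diag(keep / p) @ B:
    return (A * scale) @ B

rng = np.random.default_rng(0)
A, B = rng.normal(size=(4, 100)), rng.normal(size=(100, 3))
p = np.full(100, 0.3)                         # ~30% of the multiplications
est = bernoulli_crs_matmul(A, B, p)           # E[est] equals A @ B
```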



cb77649f5d53798edfa0ff40dae46322-Supplemental.pdf

Neural Information Processing Systems

Optimization is a key component for training machine learning models and has a strong impact on their generalization. In this paper, we consider a particular optimization method -- the stochastic gradient Langevin dynamics (SGLD) algorithm -- and investigate the generalization of models trained by SGLD.
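For reference, the standard SGLD update takes a stochastic gradient step and injects Gaussian noise scaled by the step size and an inverse temperature $\beta$. The sketch below shows this generic form; it is independent of the specific analysis in the paper.

```python
import numpy as np

def sgld_step(theta, stoch_grad, eta, beta=1.0, rng=np.random):
    """One SGLD update: theta - eta * g + sqrt(2 * eta / beta) * xi.

    g is a stochastic gradient estimate and xi ~ N(0, I); eta is the
    step size and beta the inverse temperature (the usual knobs).
    """
    noise = rng.normal(size=theta.shape)
    return theta - eta * stoch_grad + np.sqrt(2.0 * eta / beta) * noise
```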